Proceedings of the 17th Linux Audio Conference (LAC-19), CCRMA, Stanford University, USA, March 23-26, 2019


ISOCHRONOUS CONTROL + AUDIO STREAMS FOR ACOUSTIC INTERFACES 


Max Neupert 


Clemens Wegener 


The Center for Haptic Audio Interaction Research 
Weimar, Germany 
max@chair.audio 


The Center for Haptic Audio Interaction Research 
Weimar, Germany 
clemens@chair.audio 


ABSTRACT 

An acoustic interface (also: hybrid controller) is presented. By tapping, scratching, rubbing, bowing, etc. on the surface, excitation signals for digital resonators (waveguides, lumped models, modal synthesis and sample convolution) are created in synchrony with augmenting control signals. We describe how direct acoustic excitation delivers an intimate and intuitive interaction. Questions are raised about which protocols to use for isochronous audio and control transmission, as well as which file formats to use. Standardization of such protocols is desirable for future hybrid instruments with analog interfaces. A first step towards standardization is made with the publication of our implementation.

1. INTRODUCTION 

Recent developments in the musical instrument controller market follow the demand for more expressive and continuous control. At the same time, increased computing power allows for computationally expensive synthesis methods, so that more parameters can be used as a continuous stream of control data in several degrees of freedom.

1.1. Keys or silicone? 

A MIDI keyboard is generally sufficient to generate the parameters for a simple electronic representation of a piano. The mod wheel and pitch bend extended this affordance only mildly. For instruments with continuous articulation, such as wind and string instruments, the single parameter velocity is inadequate. When Yamaha released the CS-80 in 1977, it pioneered aftertouch on every key and laid the foundations for a class of ‘extended keyboards’ such as the Haken Continuum [1], McPherson’s TouchKeys [2] and the Seaboard [3] by Roli. All these instruments make multiple parameters per key continuously available. A standardization effort for these parameter streams led to the MIDI Polyphonic Expression (MPE) specification. Jones’ Soundplane [4] and Linn’s LinnStrument likewise belong to this group of instruments but do away with the traditional (and, some may say, reactionary) piano key layout.

1.2. Exciting audio 

A full audio signal offers even more expression than control-rate parameters alone. Contact microphones (piezoelectric sensors) have therefore become a staple of electro-acoustic exploration. They have also found their way into commercial musical instruments, but mostly as cheap threshold trigger pads performing far below their potential. Only a handful of commercially available instruments, namely Korg’s Wavedrum (1994), Zamborlin’s Mogees [5] (2014) and the ATV aFrame [6] (2017), have put them to much more adequate use by feeding the excitation signal into a digital resonator. In research, a variety of experimental and affordable instruments with acoustic interfaces have been proposed, ranging from ceramic tiles as a source for percussive sounds [7], to acrylic sheets instead of guitar strings [8], [9], to intricate prototypes with vibration-insulated pads for eight fingers [10].

1.3. Marrying control and exciter 

Miller’s tiles [7] and Momeni’s Caress [10] consider the processing of the contact microphone signal alone as sufficiently expressive. Cook’s Nukulele [11] combines two sensors, one at audio rate and one at control rate, to create the affordance of a ukulele, which is played with both hands at different positions on the instrument. As with a guitar, one hand controls the parameters while the other provides an excitation signal: the former is the control-rate input, the latter the audio-rate input.

The Kazumi by Zayas is an instrument which combines capacitive sensing and piezoelectric microphones on the same surface [12]. It features seven separate faces in a prismatic heptagonal shape. Each of the faces has a copper capacitive sensing layer which divides it into six areas from bottom to tip, combined with a piezo microphone underneath.

We want to augment the sound signal with additional parameters, so we simultaneously track the position of the touch on the surface. This makes a second hand for generating parameters obsolete (Figure 1). Our implementation creates a percussive instrument which can be hit, but which can also be played melodically and in continuous gestures by rubbing, scratching, or bowing on its edge.
Figure 1: Hybridity of audio and control data


1.4. Instrument versus controller 

Great effort has been put into abstracting controller hardware to become universal input devices for software instruments. The generic controller is an interface for changing parameters on the synthesizer in which the actual sound is generated. In our instrument it is not so easy to define where the controller ends and the instrument starts. Cook writes that “...many of the striking lessons from our history of intimate expressive musical instruments lie in the blurred boundaries between player, controller, and sound producing object.” [11].
In our instrument we use the actual audio signal from the surface, which is then fed into a digital filter on the computer. In effect, a significant component of the final sound is defined by the spectrum and gesture of the excitation signal. While the term ‘hybrid controller’ is found in the literature [9], we prefer to describe the Tickle as an ‘acoustic interface’. In our opinion ‘hybridity’ is too generic and does not specify its components, while ‘acoustic interface’ adds clarity about its nature.

2. THE TICKLE 

The following section describes the components of the instrument. 

2.1. Hardware 

The case is made of bent steel with wooden side panels. Its top surface is a printed circuit board carrying a capacitive touchpad, three endless rotary encoders with associated RGB LEDs, and up/down buttons (for transposition or other parameters). On the back are six ports:

1. External in (if plugged-in it mutes the built-in sensor) 

2. CV out Y axis (0-4 V) 

3. CV out X axis (0-4 V) or note 

4. Host (micro-USB port) 

5. Gate or envelope (0-5 V) 

6. Excitation (audio signal) 

2.2. Surface 

After a brief evaluation of piano key layouts and variations thereof [13], we concluded that a piano key layout contradicts the intended interaction with the instrument. A hexagon pattern was chosen to obtain equally spaced and equally sized¹ segments without empty spaces on the surface. It is also found in other electronic instruments and controllers, for example the Snyderphonics Manta [14]. From the 8-bit resolution of the X and Y axes we calculate in which of the 14 hexagons printed on the surface a touch occurred. The capacitive touch sensing is single-touch, so polyphony cannot be achieved by simultaneous touches. A gesture with two or more points of contact produces erroneous ghost touch points and therefore needs to be avoided while playing. However, with voice allocation we can let one touch keep resonating while a new touch gets its own resonator, so subsequent touch events may have overlapping resonances.
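The Python sketch below illustrates the two steps described above: mapping an 8-bit touch position to one of the 14 hexagons, and giving each new touch its own resonator voice. The hexagon arrangement (three offset rows of 5, 4 and 5 cells), the assumption that raw X and Y units correspond to equal physical distances, the number of voices and the oldest-voice stealing policy are all hypothetical; they are not the Tickle's documented layout or allocation scheme. Because a regular hexagonal tiling is the Voronoi diagram of its cell centres, the hexagon containing a touch is simply the one with the nearest centre.

```python
import math

# Hypothetical layout: 14 hexagons in three offset rows of 5, 4 and 5 cells,
# expressed in raw 8-bit touchpad units (0..255), assumed to be isotropic.
SPACING = 255 / 5                        # centre-to-centre distance within a row
ROW_PITCH = SPACING * math.sqrt(3) / 2   # row distance of a regular hexagonal tiling
MID = 255 / 2


def hex_centres():
    centres = []
    for row, count in enumerate((5, 4, 5)):
        y = MID + (row - 1) * ROW_PITCH
        # the shorter middle row is shifted by half a cell
        x0 = SPACING / 2 if count == 5 else SPACING
        centres.extend((x0 + i * SPACING, y) for i in range(count))
    return centres


CENTRES = hex_centres()


def hexagon_index(x, y):
    """Return the index (0-13) of the hexagon containing the 8-bit touch (x, y)."""
    if not (0 <= x <= 255 and 0 <= y <= 255):
        raise ValueError("coordinates must be 8-bit values")
    # nearest centre == containing cell in a regular hexagonal tiling;
    # points beyond the outer centres fall into the (partial) edge hexagons
    return min(range(len(CENTRES)),
               key=lambda i: (x - CENTRES[i][0]) ** 2 + (y - CENTRES[i][1]) ** 2)


class VoiceAllocator:
    """Give every new touch its own resonator, stealing the oldest voice if all are busy."""

    def __init__(self, n_voices=4):
        self.n_voices = n_voices
        self.started = [None] * n_voices   # touch number that (re)started each voice
        self.counter = 0

    def note_on(self):
        idle = [v for v, t in enumerate(self.started) if t is None]
        if idle:
            voice = idle[0]
        else:
            voice = min(range(self.n_voices), key=lambda v: self.started[v])
        self.started[voice] = self.counter
        self.counter += 1
        return voice

    def release(self, voice):
        # call once the resonance of this voice has decayed
        self.started[voice] = None
```

On each touch event, `hexagon_index(x, y)` would select the pitch or preset, and `note_on()` the resonator that receives the following excitation signal, so earlier resonances keep ringing.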

2.3. Material and Texture 

To create an acoustic excitation signal we rely on a hard material that 
captures the spectra of different gestures. In addition to the rigidity 
of the material, a textured surface is essential to create enough noise 
when rubbed and wiped. Silicone surfaces are not suitable for our 
application since they absorb too much of the subtle interaction. 

¹ Except for the hexagons at the edges.

2.4. Residual and Resonance

Generally, we want the physical surface of the instrument to resonate as little as possible, so that we can feed the dry residual signal of the touch gesture (rub, scratch, hit, flick, bow, etc.) as an excitation signal into a digital resonator (see also [7]). This way the full power of physical modeling synthesis algorithms may be accessed. The practice of sending generated noise bursts or clicks into digital resonators, which is found in the physical modeling literature and is still the standard in many software and hardware implementations, cripples the true potential of such algorithms.
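As a minimal illustration of feeding a recorded residual signal, rather than a generated noise burst, into a digital resonator, the sketch below implements a Karplus-Strong style string: a delay line with a two-point averaging low-pass filter in its feedback loop. It is a textbook waveguide-string simplification chosen for brevity, not the synthesis used in the Tickle; the delay length and feedback gain are arbitrary example values, and a real-time version would keep the delay line between audio blocks instead of starting from silence on every call.

```python
def driven_string(excitation, delay=100, feedback=0.99):
    """Feed an arbitrary excitation signal into a Karplus-Strong style string.

    y[n] = x[n] + feedback * 0.5 * (y[n - delay] + y[n - delay - 1])
    The pitch is roughly sample_rate / (delay + 0.5).
    """
    line = [0.0] * delay   # circular buffer holding the last `delay` outputs
    idx = 0
    prev = 0.0             # y[n - delay - 1], the previously read delay output
    out = []
    for x in excitation:
        delayed = line[idx]                          # y[n - delay]
        # two-point average acts as a gentle low-pass loss filter
        y = x + feedback * 0.5 * (delayed + prev)
        prev = delayed
        line[idx] = y
        idx = (idx + 1) % delay
        out.append(y)
    return out
```

Passing a block of piezo samples (padded with zeros to let the string ring out) instead of a single impulse lets the spectrum and envelope of the touch gesture shape the resulting tone.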

2.5. Synthesis 

For the sound synthesis we employ techniques of digital reverberators, which at their heart consist of delay lines, feedback and filters. They can be understood as simulations (waveguides and mass-spring models) of the physics happening in real instruments, as described by Smith [15]. Such models can be generated with Berdahl and Smith’s Synth-A-Modeler compiler [16], which has received a graphical interface with Vasil’s SaM-Designer [17]. Synth-A-Modeler generates Faust code, which can be compiled into a variety of other formats such as a Pure Data external. With the Pure Data object pmpd~ from Henry’s PMPD [18] library, which creates static mass and spring models, we achieved nice-sounding string, plate, and gong topologies. However, we are not aiming for perfect recreations of classic instruments; our interest lies in the exploration of synthetic sounds with an acoustic and intimate level of control. Algorithms such as the nested comb filter delay described by Ahn and Dudas [19] prove interesting and fun to interpret with our instrument while being surprisingly cheap to compute. We can employ our acoustic interface to excite extended, hybrid and abstract cyberinstruments as described by Kojs et al. [20]. Convolution methods with samples can be useful to digital Foley artists for articulating a sample in a plenitude of variations.
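Convolving the excitation signal with a recorded sample treats the sample as an impulse response, so every tap, scratch or rub on the surface is rendered as a correspondingly shaped variation of that sample. The direct-form Python sketch below shows the principle; the function name and signature are illustrative, and a real-time implementation would more likely use partitioned FFT convolution to keep the per-block cost manageable for long samples.

```python
def convolve(excitation, sample):
    """Direct-form convolution: each excitation value triggers a scaled copy of the sample."""
    if not excitation or not sample:
        return []
    out = [0.0] * (len(excitation) + len(sample) - 1)
    for n, x in enumerate(excitation):
        if x == 0.0:
            continue            # silent input contributes nothing
        for k, h in enumerate(sample):
            out[n + k] += x * h
    return out
```

A single sharp tap approximates an impulse and reproduces the sample almost unchanged, while a slow rub smears many overlapping, differently weighted copies into a new texture.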



2.6. Software Architecture and Code 

Our hardware is based on a Cypress PSoC 5 microcontroller and runs firmware that digitizes the capacitive sensing surface and the signal from the piezoelectric sensor. It communicates with a custom kernel driver, which in turn communicates with user-space software such as our Pure Data external or a VST plugin. Our kernel driver for Linux as well as the Pure Data external are published under a free license. A repository of the source code² is available (mirrored on GitHub³).

2.7. Drivers and Communication 

A great challenge was to transmit control-rate signals married to a stream of audio with a stable latency and a reliable offset between them. The capacitive sensing reports a position every 4 ms, while the audio streams at a sample rate of 48 kHz with a block size of 64 samples. Currently the user-space software is expected to match these settings to work reliably. We wrote our own Linux kernel driver that receives this isochronous stream of control-rate and audio-rate signals from the device via USB.
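The numbers above fit together exactly: 4 ms at 48 kHz is 192 samples, i.e. three 64-sample blocks of 4/3 ms each, so every position report can be attributed to a block boundary. The Python sketch below makes this arithmetic explicit and shows why the user-space settings have to match; the function names are hypothetical, and it assumes that the control and audio streams start at the same instant.

```python
from fractions import Fraction

SAMPLE_RATE = 48_000                  # Hz
REPORT_PERIOD = Fraction(4, 1000)     # seconds between position reports


def samples_per_report(sample_rate=SAMPLE_RATE):
    return REPORT_PERIOD * sample_rate


def block_duration_ms(block_size=64, sample_rate=SAMPLE_RATE):
    return Fraction(block_size, sample_rate) * 1000


def report_position(n, block_size=64, sample_rate=SAMPLE_RATE):
    """(audio block, offset within block) of the n-th position report,
    assuming both streams start at the same instant."""
    start = n * samples_per_report(sample_rate)
    if start.denominator != 1:
        raise ValueError("report does not fall on a whole sample")
    return divmod(int(start), block_size)
```

With a 128-sample block, reports would alternate between the start and the middle of a block, and at 44.1 kHz the 4 ms period is not a whole number of samples (176.4), so the offset between control and audio would drift within each block.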

3. STANDARDS FOR TRANSMISSION AND STORAGE 

We believe that acoustic interfaces will soon become a category of their own and that manufacturers will introduce hybrid controllers to the market. To make these new devices work with synthesis software, a standardization effort for interoperability will be necessary. McMillen and Thew published a proposal on how to send sound spectrum information over MIDI and OSC [21]. However, many questions are yet to be answered about which format and standard should be used for audio and data. A plethora of further questions arise when considering the integration of a track containing control data and audio-as-synthesis-source into a DAW. With this publication and the open source driver we wish to start a discussion about possible open standards for the transmission, storage, and integration of analogue interfaces into the creative workflow of musicians.

3.1. Specifications for the Driver 

Our aim is an isochronous transfer of data and audio-rate signals with minimal latency and, more importantly, little jitter [22]. The touch position data needs to be present before the audio arrives so that the synthesis can be tuned. The offset between signal and data must not vary. The audio stream does not need to be continuous; it could start with the touch event and end with it. In a future polyphonic version, several audio streams could exist in parallel. The implementation could be a data protocol with (multichannel) audio streaming segments on demand, or a continuous audio stream with additional data interwoven. The touch events should refer to a specific sample in the audio, possibly with a timestamp. Other interface data such as extra knobs, faders, potentiometers or rotary encoders do not need this timing precision.
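One way to meet these requirements is to prefix each audio block with a small control header that carries a running block counter and the offset of the touch event within the block, so the receiver can reconstruct the absolute sample at which the touch occurred. The Python sketch below packs and unpacks such a packet with the standard `struct` module; the field layout (little-endian 32-bit block counter, touch flag, 8-bit X and Y, one padding byte, 16-bit sample offset, then 64 signed 16-bit samples) is a hypothetical example, not the format used by our driver.

```python
import struct

BLOCK_SIZE = 64
# Hypothetical layout: block counter (uint32), touch flag, X, Y (uint8),
# one padding byte, touch offset within the block (uint16), all little-endian.
HEADER = struct.Struct("<IBBBxH")
AUDIO = struct.Struct(f"<{BLOCK_SIZE}h")   # 64 signed 16-bit samples


def pack_block(block_no, touch, x, y, offset, samples):
    if len(samples) != BLOCK_SIZE:
        raise ValueError("expected one full audio block")
    if not 0 <= offset < BLOCK_SIZE:
        raise ValueError("offset must lie inside the block")
    return HEADER.pack(block_no, int(touch), x, y, offset) + AUDIO.pack(*samples)


def unpack_block(packet):
    block_no, touch, x, y, offset = HEADER.unpack_from(packet, 0)
    samples = AUDIO.unpack_from(packet, HEADER.size)
    return {
        "touch": bool(touch),
        "x": x,
        "y": y,
        # absolute, sample-accurate position of the touch in the audio stream
        "sample": block_no * BLOCK_SIZE + offset,
        "audio": list(samples),
    }
```

Because the control data travels in the same packet as the audio it refers to, the offset between them cannot vary in transit; only the packet as a whole is subject to transport latency.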

3.2. Surveyed Communication Protocols 

We’ve considered different established and experimental protocols. 
Each was evaluated against the aforementioned goals. 

1. A kernel module driver was our choice, as it gives us the maximum amount of control to make sure it meets our criteria. However, it requires an installation procedure. On Windows and Mac OS the operating system vendor restricts who can distribute kernel modules; in fact, we have paid Apple and applied for kernel signing and are still waiting for any response after five months. On Linux, Secure Boot needs to be deactivated or the kernel module signed manually. A custom kernel driver means additional development overhead and, for the customer, the fear that the device will be rendered useless if support ends.

2. Audio spectrum data (via MIDI or OSC). Another approach would be to break the audio down into metadata and send it over established protocols like MIDI or OSC, which would allow for a partial reconstruction. This was proposed in the aforementioned draft by McMillen [21]. We dismissed this approach because we consider a full audio stream necessary to avoid the latency that the analysis of such descriptive meta information requires. It also creates computational overhead on both the sending and the receiving device.

3. Audio and MIDI Class Compliant drivers are a viable alternative. A single USB connection can provide two virtual devices: an audio interface and a HID or MIDI device. Using standards means compatibility, no driver installation and continuous support. However, it is not guaranteed that latency and offset will be consistent. Another problem lies in the limitations of popular proprietary DAWs like Ableton Live, which only allow the use of one sound card at a time. Assuming that the sound synthesis happens in a plugin of the DAW, this restriction would prevent the plugin from accessing the audio device.

4. Control Data as Audio Signal. Control data may be sent as signals at audio rate, not unlike control voltage in synthesizers or upsampled sensor output in Wessel’s Slabs, which features 96 channels of audio [23]. It could also be encoded as frequencies and later be decoded with a Fourier transformation, as in the Nuance described by Michon et al. [24].

5. MIDI 2.0. There is no indication that MIDI 2.0, which is currently in the prototyping stage at the MIDI Manufacturers Association, will include the feature to send audio streams for acoustic interfaces.

This list does not claim to be complete; for example, we have not surveyed protocols like Ultranet or AVB. It is likely that we have overlooked something and that a sensible solution to our problem is already available.
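Approach 4 can be illustrated by upsampling the 4 ms position reports to audio rate with a sample-and-hold and carrying them as a second channel next to the excitation signal, much like a control voltage. The Python sketch below assumes 48 kHz (192 samples per report), maps 8-bit positions linearly onto the range -1.0 to 1.0, and interleaves both channels into stereo frames; all names are hypothetical. The encoding only survives a bit-transparent audio path: resampling, gain changes or dithering in the operating system's mixer would corrupt the decoded values.

```python
SAMPLES_PER_REPORT = 192   # 4 ms at an assumed 48 kHz


def control_to_audio(positions, samples_per_report=SAMPLES_PER_REPORT):
    """Sample-and-hold upsampling of 8-bit positions to an audio-rate channel."""
    channel = []
    for p in positions:
        value = p / 127.5 - 1.0          # map 0..255 linearly onto -1.0..1.0
        channel.extend([value] * samples_per_report)
    return channel


def interleave(excitation, control):
    """Combine excitation and control into interleaved two-channel frames."""
    if len(excitation) != len(control):
        raise ValueError("channels must have equal length")
    frames = []
    for audio_sample, control_sample in zip(excitation, control):
        frames.extend((audio_sample, control_sample))
    return frames


def decode_position(control_sample):
    """Recover the 8-bit position from one sample of the control channel."""
    return round((control_sample + 1.0) * 127.5)
```

Since both channels share one sample clock, the receiver can read the position at exactly the sample where the excitation occurs, which is the property a class-compliant two-device setup does not guarantee.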

4. FUTURE WORK 

Future research may be conducted to implement the following features in the instrument:

1. Multi-touch, to eliminate ghosting when two fingers touch the surface simultaneously. It would also allow for polyphony later on.

2. Pressure sensing [25], either for every point or at least globally for the whole surface.

3. Haptic feedback, which is challenging to implement due to the feedback into the sensor, but can give the user a much more intense sense of reality. The Lofelt Basslet [26] is a good demonstration of such a device.

4. Integrated sound synthesis, implemented either by a) analog circuitry or b) an embedded computing platform, for example the Bela board [27].

5. Playful interfaces to manipulate mass-spring models in real time, as seen in Allen’s Ruratae [28].


² Source code: https://gitlab.chair.audio/explore/projects
³ GitHub mirror: https://github.com/chairaudio
Figure 3: One of the more unconventional and unintended ways to play the Tickle


5. CONCLUSIONS 

Our instrument, the Tickle, combines several well-known techniques and technologies which on their own are not new. Touch pads, contact microphones, and physical modeling synthesis have been around for decades. However, in combination they synergize into a powerful, intuitive instrument which allows for a natural and intimate [29] interaction with precise and reproducible control over sound. Feeding an analogue excitation signal into a (digital) resonator can create familiar as well as alien sounds: sounds which either behave like instruments we know (violin, guitar, snare drum, cymbal, gong, marimba, etc.) or sounds which are distinctly synthetic but have an analogue touch.⁴

With this paper we hope to have shown the necessity of sample-accurate, low-latency and jitter-free communication for acoustic interfaces and to have started a discussion on how to achieve it.
⁴ A playlist of video demonstrations of the instrument can be found on our website https://chair.audio